Milestone 1
12 Oct 2021
Question 1
1.1
Screenshot:
Explanation:
Issues:
The players with the highest SV% have faced very few shots against (SA), so the sample size is very small. A larger sample (higher SA) would give a more accurate SV% with a smaller margin of error.
Fix issue:
We can restrict the ranking to players who have faced at least the average number of SA, and then sort by SV%.
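As a minimal sketch of this fix, the filter-then-sort step could look like the following. The goalie rows, the column names and the helper `rank_by_save_pct` are made up for illustration and are not taken from our notebook.

```python
# Hypothetical goalie rows; the names and numbers are made up for illustration.
goalies = [
    {"player": "Goalie A", "SA": 12, "SV%": 1.000},
    {"player": "Goalie B", "SA": 1500, "SV%": 0.925},
    {"player": "Goalie C", "SA": 1800, "SV%": 0.918},
    {"player": "Goalie D", "SA": 40, "SV%": 0.950},
]


def rank_by_save_pct(rows):
    """Keep goalies with at least the average SA, then sort by SV% (descending)."""
    mean_sa = sum(r["SA"] for r in rows) / len(rows)
    qualified = [r for r in rows if r["SA"] >= mean_sa]
    return sorted(qualified, key=lambda r: r["SV%"], reverse=True)


ranking = rank_by_save_pct(goalies)
print([r["player"] for r in ranking])
```

Goalies A and D have a perfect or near-perfect SV% on only a handful of shots, so they drop out before the sort.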
1.2
Screenshot:
1.3
In determining a goalie’s performance, other features that could potentially be useful are Shots Against (SA), Goals-Against Average (GAA), Goalie Point Shares (GPS), and Games Played (GP) and Wins (W).
Having a high SA means a larger sample size for SV%. A larger sample (higher SA) would give a more accurate SV% with a smaller margin of error.
GAA is the number of goals allowed per 60 minutes played. So, the lower the GAA, the better the goalie’s performance.
GPS is an estimate of the number of points contributed by a player due to his play in goal. So, beyond metrics that only count the shots a goalie saved, this metric estimates how many points his goaltending contributed to the team. The higher the GPS, the better the goalie’s performance.
At the end of the day, what matters after a game is winning. A goalie does a lot more to impact a game than what appears in his post-game stats, for example stopping “dump-ins”. Therefore, a higher win percentage, calculated as W/GP, could also be useful in determining a goalie’s performance.
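The two rate metrics above reduce to simple arithmetic. The sketch below uses a hypothetical season line; the function names and the numbers are illustrative only.

```python
def goals_against_average(goals_against, minutes_played):
    """Goals allowed per 60 minutes played."""
    return goals_against * 60 / minutes_played


def win_percentage(wins, games_played):
    """Share of games played that the goalie won (W / GP)."""
    return wins / games_played


# Hypothetical season line, for illustration only.
gaa = goals_against_average(goals_against=120, minutes_played=3000)
win_pct = win_percentage(wins=30, games_played=50)
print(gaa, win_pct)
```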
Question 2
First, we worked out the game ID naming rules from the API. Then, we tested the IDs and printed their corresponding JSON data in the terminal to inspect it.

Next, in our Python code, we loop through the years and through the regular season and playoffs to build every game ID. For each game ID, we fetch the game’s JSON data from the statsapi webpage and save it locally. The downloaded JSON files are saved in this file structure:
JSON_data
│
└───regular_seasons
│   └───2016
│       │   2016020001.json
│       │   ...
│
└───playoffs
    └───2016
        │   2016030111.json
        │   ...
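The ID construction and the file layout above can be sketched as follows. The ID layout (season start year, two-digit game type, four-digit game number, with playoff numbers encoding round, matchup and game) is our reading of the naming rules and is an assumption here, as are the helper names. The download step itself is left out; not every candidate playoff ID corresponds to a game that was played, so a real downloader would need to skip IDs the API does not return.

```python
from pathlib import Path

# Assumed ID layout: 4-digit season start year + 2-digit game type + 4-digit game number.
GAME_TYPES = {"regular_seasons": "02", "playoffs": "03"}


def regular_season_ids(season, n_games):
    """IDs for regular-season games 1..n_games (n_games is supplied by the caller)."""
    code = GAME_TYPES["regular_seasons"]
    return [f"{season}{code}{n:04d}" for n in range(1, n_games + 1)]


def playoff_ids(season):
    """Candidate playoff IDs: 0 + round (1-4) + matchup + game (1-7)."""
    code = GAME_TYPES["playoffs"]
    ids = []
    for rnd in range(1, 5):
        n_matchups = 2 ** (4 - rnd)  # 8, 4, 2, 1 series per round
        for matchup in range(1, n_matchups + 1):
            for game in range(1, 8):
                ids.append(f"{season}{code}0{rnd}{matchup}{game}")
    return ids


def local_path(root, game_type, game_id):
    """Where a game's JSON would be stored, following the tree above."""
    return Path(root) / game_type / game_id[:4] / f"{game_id}.json"


print(regular_season_ids(2016, 2))
print(local_path("JSON_data", "playoffs", playoff_ids(2016)[0]))
```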
Question 3
Screenshot:
Explanation:
In this debugging tool, the user can select the season from all available years, and select the game type between regular_season and playoffs. The game_number slider lets the user select a particular game of the season, and the eventIdx slider selects the event number.
The diagram then displays the coordinates (large blue dot) where the event happened.
We used matplotlib to draw the coordinates and to add the background image, and used ipywidgets to add the interactive functionality.
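The lookup behind the eventIdx slider can be sketched without the plotting and widget wiring. The nested keys used below (`liveData`, `plays`, `allPlays`, `coordinates`) are an assumption about the layout of the downloaded JSON, and `event_coordinates` and `sample_game` are hypothetical names for illustration.

```python
def event_coordinates(game, event_idx):
    """Return (x, y) for one event, or None if the event has no coordinates."""
    plays = game["liveData"]["plays"]["allPlays"]  # assumed JSON layout
    coords = plays[event_idx].get("coordinates", {})
    if "x" in coords and "y" in coords:
        return coords["x"], coords["y"]
    return None


# Tiny hand-written stand-in for one downloaded game file.
sample_game = {"liveData": {"plays": {"allPlays": [
    {"result": {"event": "Game Scheduled"}, "coordinates": {}},
    {"result": {"event": "Shot"}, "coordinates": {"x": -60.0, "y": 12.0}},
]}}}

print(event_coordinates(sample_game, 1))
```

Events without a location (such as the start of a period) return `None`, so the plot can simply skip drawing the dot for them.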
Question 4
4.1

4.2
We know that the strength of players on the ice starts off even. So, every event is at even strength (5 on 5) until a penalty occurs. When a penalty occurs, the strength changes: the penalized player goes to the penalty box, leaving his team short-handed and giving the other team a power play. This changes the strength to 5 on 4. A penalty lasts a certain number of minutes depending on the type of penalty. So, the strength changes back to even automatically once the penalty expires, and we can check the game time of every event to know when to change the strength back. The other possibility is that the team on the power play scores a goal. In that case (at least for a minor penalty), the strength also changes back to even.
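A minimal sketch of this bookkeeping is shown below. The event format (`time` in seconds, `type`, `team`, `minutes`) is a simplified, hypothetical one rather than the raw API layout, and the sketch assumes every penalty is one that ends early on a power-play goal.

```python
def annotate_strength(events):
    """Return a 'home-away' skater string (e.g. '5-4') for each event, in order."""
    active = []      # (penalized_team, expiry_time_in_seconds)
    strengths = []
    for ev in events:
        # Penalties that have run out no longer affect the strength.
        active = [(team, end) for team, end in active if end > ev["time"]]
        skaters = {
            side: max(3, 5 - sum(1 for team, _ in active if team == side))
            for side in ("home", "away")
        }
        strengths.append(f"{skaters['home']}-{skaters['away']}")

        if ev["type"] == "PENALTY":
            active.append((ev["team"], ev["time"] + 60 * ev["minutes"]))
        elif ev["type"] == "GOAL":
            other = "away" if ev["team"] == "home" else "home"
            if skaters[ev["team"]] > skaters[other]:
                # Power-play goal: the earliest-expiring penalty of the other team ends.
                ends = sorted(end for team, end in active if team == other)
                active.remove((other, ends[0]))
    return strengths


# Hypothetical sequence of events for one game.
events = [
    {"time": 100, "type": "SHOT", "team": "home"},
    {"time": 200, "type": "PENALTY", "team": "away", "minutes": 2},
    {"time": 250, "type": "SHOT", "team": "home"},
    {"time": 260, "type": "GOAL", "team": "home"},
    {"time": 270, "type": "SHOT", "team": "away"},
    {"time": 400, "type": "PENALTY", "team": "home", "minutes": 2},
    {"time": 450, "type": "SHOT", "team": "away"},
    {"time": 530, "type": "SHOT", "team": "away"},
]
strengths = annotate_strength(events)
print(strengths)
```

Each event is labelled with the strength in effect at the moment it happens, so the penalty event itself is still recorded at the strength that preceded it.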
4.3
In hockey, a rebound occurs when a puck gets shot and bounces off the goalie or the frame of the goal back into play. We can speculate that a shot or goal came from a rebound by paying attention to the time of the event and to the event that preceded it. If the previous event was a shot from the same team and happened within about 2 seconds, it is likely that the current shot came off a rebound.
According to the YouTube video linked in the instructions, a play off the rush appears to be when a single player rushes towards the end where he is trying to score, with defenders chasing him to stop him. I don’t know anything about hockey, but I assume this happens when everyone is on one side of the rink and possession suddenly changes (which is why there are no defenders on their goal side and they are chasing the rushing player). Possession changes when there is a giveaway or takeaway event. So, we can speculate that a player’s shot or goal came off the rush if the previous event was a giveaway or a takeaway and occurred within a few seconds.
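Both heuristics only need the current event and the one before it, so they can be sketched together. The event format and the function name are hypothetical, the 2-second rebound window comes from the text above, and the 4-second rush window is just one possible reading of “a few seconds”.

```python
SHOT_EVENTS = {"SHOT", "GOAL"}
TURNOVER_EVENTS = {"GIVEAWAY", "TAKEAWAY"}


def flag_rebound_and_rush(events, rebound_window=2, rush_window=4):
    """Add 'rebound' and 'rush' booleans to every shot or goal event."""
    flagged = []
    prev = None
    for ev in events:
        ev = dict(ev)  # copy so the input is left untouched
        if ev["type"] in SHOT_EVENTS:
            gap = ev["time"] - prev["time"] if prev else None
            ev["rebound"] = bool(
                prev and prev["type"] == "SHOT"
                and prev["team"] == ev["team"] and gap <= rebound_window
            )
            ev["rush"] = bool(
                prev and prev["type"] in TURNOVER_EVENTS and gap <= rush_window
            )
        flagged.append(ev)
        prev = ev
    return flagged


# Hypothetical events, with time in seconds.
flagged = flag_rebound_and_rush([
    {"time": 10, "type": "SHOT", "team": "home"},
    {"time": 11, "type": "GOAL", "team": "home"},
    {"time": 50, "type": "TAKEAWAY", "team": "away"},
    {"time": 53, "type": "SHOT", "team": "away"},
    {"time": 90, "type": "SHOT", "team": "home"},
])
print([(e["type"], e.get("rebound"), e.get("rush")) for e in flagged])
```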
Question 5
5.1
Most dangerous type of shot for the offensive team: Wrap-around. It has the lowest shot count (521) and goal count (38), as well as the lowest goal percentage (0.06797853). Therefore, it is dangerous for the offensive team to play this type of shot, as it seems to be the least successful.
Most dangerous type of shot for the defensive team: Tip-in. It does not have as many shots and goals as the other types of shots, but it has the highest goal percentage (0.18019306). Therefore, it is dangerous for the defensive team if the offensive team plays this type of shot, as it seems to be the most successful for the offensive team.
Most common type of shot: Wrist Shot. There were over 30,000 total shots and goals from all shooters and scorers who used a wrist shot in the 2020-2021 season.
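The wrap-around figures quoted above are consistent with goal percentage being computed as goals divided by shots plus goals, i.e. with the shot count excluding goals. A small sketch of that calculation, with `goal_percentage` as an illustrative helper name:

```python
def goal_percentage(shots, goals):
    """Goals as a fraction of all attempts, where `shots` counts non-goal shots."""
    return goals / (shots + goals)


# Wrap-around counts quoted in 5.1.
wrap_around = goal_percentage(shots=521, goals=38)
print(round(wrap_around, 8))
```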
5.2
2018

2019

2020
For shot distances between 0 and 75 feet, our “Goal Percentage based on Shot Distance in the X-Y Season” graphs show that as shot distance increases, the goal percentage decreases. So, there is generally a negative correlation between shot distance and goal percentage between 0-75 feet. According to our “Shot or Goal Counts based on Shot Distance in the X-Y Season” histograms, there are also very few shots and goals for shot distances beyond 75 feet, so the sample size there is very small. The goal percentage for shots taken beyond 75 feet changes drastically from one shot distance bin to the next. So, goal percentages beyond 75 feet can mostly be ignored. Across the 3 seasons, there is very little change in the overall shape of the “Shot or Goal Counts based on Distance” and “Goal Percentage based on Shot Distance” graphs.
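The binning behind these graphs can be sketched as follows. The net position (89, 0), the 5-foot bin width and the function names are assumptions made for the example, and the shots are assumed to be already normalized so that every team attacks the same net.

```python
import math
from collections import defaultdict

NET_X, NET_Y = 89.0, 0.0  # assumed net location in rink coordinates (feet)


def shot_distance(x, y):
    """Euclidean distance from the shot location to the assumed net position."""
    return math.hypot(NET_X - x, NET_Y - y)


def goal_pct_by_distance(shots, bin_width=5):
    """Map each distance bin's lower edge to (attempts, goal percentage)."""
    attempts = defaultdict(int)
    goals = defaultdict(int)
    for x, y, is_goal in shots:
        lower = int(shot_distance(x, y) // bin_width) * bin_width
        attempts[lower] += 1
        goals[lower] += int(is_goal)
    return {b: (attempts[b], goals[b] / attempts[b]) for b in sorted(attempts)}


# Hypothetical (x, y, is_goal) rows.
table = goal_pct_by_distance([
    (85, 0, True),
    (86, 0, False),
    (59, 0, False),
    (49, 30, True),
])
print(table)
```

Returning the attempt count next to the percentage makes it easy to see which bins have too few shots to be trusted, such as those beyond 75 feet.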
5.3

If we associate this figure with our previous “Shot or Goal Counts based on Shot Distance” histograms in question 5.2, we know that shots taken from distances beyond 75 feet should mostly be disregarded due to the very small sample size. So, looking just at shots taken between 0-75 feet, we can see a rough negative correlation between shot distance and goal percentage for all types of shots. However, the relationship is not always clear, because some curves have jumps. Slap shot (yellow) has the highest goal percentage between 0-30 feet, and has a relatively clear negative correlation. Wrap-around has the lowest goal percentages throughout the 0-75 feet range, which would make it the most dangerous type of shot for the offensive team because it yields the fewest goals.
Between 0-30 feet, the most dangerous type of shot for the defensive team is the slap shot (since it’s a fast one (/๑•́o•̀๑)/, and the goalie has less time to react). Beyond 30 feet, tip-in and deflected appear to have the highest goal percentages (they alternate in having the highest goal percentage as shot distance increases), so they are the most dangerous shots for the defensive team. The most dangerous type of shot overall for the offensive team is the wrap-around. Its goal percentage is below 0.1 up to 20 feet and near 0 beyond 30 feet.
Question 6
6.1
To TA: we built this interactive graph using Dash, and then deployed it on Heroku. It runs kind of slowly, so please be patient ༼ ༎ຶ ᆺ ༎ຶ༽.
Check the deployed graph
Once you select ‘year’ and ‘team’ from the dropdown, please wait for up to 20 seconds, then you’ll see the fancy graph! ;)
6.2
First, we transformed all shots so that they are directed towards the left goal. So, for all shots that were taken towards the right side, we rotated the coordinates by 180 degrees. This way, we don’t see a mix of shots taken from the entire rink.
We calculated the league average as the total number of shots and goals among all teams divided by the number of teams, per square foot of the rink. We then tallied the number of shots at every location for a given team. The plot then shows the difference between the number of shots and goals a specific team took and the league average. So, from these plots, we can see how many more or how many fewer shots a team took from specific locations on the hockey rink compared with the league average for a specific season. We can interpret these plots like a heat map. The darker the shade of pink at a specific location on the ice rink, the more shots the team took compared with the league average. The lighter the shade of pink at a specific location, the fewer shots the team took compared with the league average.
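A minimal sketch of the two steps above (flipping shots to one end, then subtracting the league average per location) could look like this. The 1 ft x 1 ft cells, the input format and the function names are assumptions for the example, and no smoothing is applied.

```python
import math
from collections import Counter


def flip_to_one_side(x, y, attacking_right):
    """Rotate shots aimed at the right-hand net by 180 degrees so all shots share one end."""
    return (-x, -y) if attacking_right else (x, y)


def shot_difference(shots, team):
    """Per-location difference between a team's shot count and the league average.

    `shots` is a list of (team, x, y) rows already flipped to one end; each
    location is a 1 ft x 1 ft cell obtained by flooring the coordinates.
    """
    teams = {t for t, _, _ in shots}
    league = Counter()
    own = Counter()
    for t, x, y in shots:
        cell = (math.floor(x), math.floor(y))
        league[cell] += 1
        if t == team:
            own[cell] += 1
    # League average per cell = total shots in the cell / number of teams.
    return {cell: own[cell] - league[cell] / len(teams) for cell in league}


# Hypothetical shots for a two-team league.
shots = [
    ("A", -80.2, 3.5),
    ("A", -80.7, 3.1),
    ("B", -80.5, 3.9),
    ("B", -60.0, 0.0),
]
diff = shot_difference(shots, "A")
print(diff)
```

Positive values mark cells where the team shoots more than the league average, and negative values mark cells where it shoots less.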
6.3
Season 2016~2017, Team: Colorado Avalanche
Season 2020~2021, Team: Colorado Avalanche
Looking at the Colorado Avalanche (AVA) shot map for the 2016-2017 season, we can see that there are more yellow areas than purple areas near the goal post. So, we can say that they took fewer shots or scored fewer goals near the goal post than the league average. Across the rest of the plot, there are small areas where AVA took more shots than the league average (in purple) and fewer shots than the league average (in yellow, although you have to look closely). It is hard to pinpoint anything distinctive about the rest of the area.
Looking at the AVA shot map for the 2020-2021 season, we can see that there is less yellow near the goal post than in their 2016-2017 shot map, meaning fewer areas near the goal post fall below the league average. This means they took more shots near the goal post in the 2020-2021 season, which suggests that their strategy changed.
Looking at the standings for the 2016-2017 season, AVA was 17/31. In 2020-2021, AVA was 1/31. This makes sense because in question 5.2 we concluded that shots taken from the shortest distances yielded the highest goal percentage, and shots near the goal post are the shots taken from the shortest distances. Since AVA took more shots near the goal post in the 2020-2021 season, they should have scored more goals, which is consistent with them finishing 1st overall in the standings in 2020-2021.
6.4
Season 2018~2019
Team: Buffalo Sabres
Team: Tampa Bay Lightning
Season 2019~2020
Team: Buffalo Sabres
Team: Tampa Bay Lightning
Season 2020~2021
Team: Buffalo Sabres
Team: Tampa Bay Lightning
The shot maps of the Buffalo Sabres (BUF) for the 2018-19, 2019-20 and 2020-21 seasons are all very spread out. For the 2018-2019 season, we can even see above-league-average shot counts from the far corners (x-coord: <20, y-coord: <10), (>60, <10), (<20, >70), (>60, >70).
By comparison, the Tampa Bay Lightning (TBL) take far fewer shots from those corners. It seems very difficult to score when shooting from near the goal line, yet BUF still shoots from there. TBL takes more shots around the centre region across the 3 seasons. We can see this from the TBL centre regions having more purple than those of BUF. This could explain their success in scoring more goals, winning more games, and winning the Stanley Cup, as shooting from this region is more direct than shooting from locations further away or near the goal line. This does not paint the complete picture of BUF’s struggles or TBL’s success, since shot maps only tell us the locations where shots were taken.
Other factors that affect a team’s success include their play style, passing, player movement and switches on the ice, and goal percentage at each shot location.